Stroke-based sketched symbol reconstruction and segmentation
Hand-drawn objects usually consist of multiple semantically meaningful parts.
For example, a stick figure consists of a head, a torso, and pairs of legs and
arms. Efficient and accurate identification of these subparts promises to
significantly improve algorithms for stylization, deformation, morphing and
animation of 2D drawings. In this paper, we propose a neural network model that
segments symbols into stroke-level components. Our segmentation framework has
two main elements: a fixed feature extractor and a Multilayer Perceptron (MLP)
network that identifies a component from that feature. As the feature
extractor, we utilize the encoder of stroke-rnn, our newly proposed
generative Variational Auto-Encoder (VAE) model that reconstructs symbols on a
stroke-by-stroke basis. Experiments show that a single encoder can be reused
to segment multiple categories of sketched symbols with a negligible effect
on segmentation accuracy. Our segmentation scores surpass those of existing
methods on an available small state-of-the-art dataset. Moreover,
extensive evaluations on our newly annotated large dataset demonstrate that our
framework achieves significantly better accuracy than baseline
models. We release the dataset to the community.
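The two-part design described above, a fixed feature extractor feeding a small MLP, can be sketched as follows. This is a minimal illustrative sketch, not the paper's code: the encoder here is a hand-crafted placeholder for the stroke-rnn VAE encoder, and all dimensions, weights, and component names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_encoder(stroke_points):
    """Stand-in for the stroke-rnn VAE encoder: maps one stroke
    (an N x 2 array of points) to a fixed-length feature vector.
    Here we use simple summary statistics as a placeholder."""
    return np.concatenate([stroke_points.mean(axis=0),
                           stroke_points.std(axis=0),
                           [len(stroke_points)]])

def mlp_classify(feature, W1, b1, W2, b2):
    """One-hidden-layer MLP mapping a stroke feature to component logits."""
    h = np.maximum(0.0, feature @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2                      # logits over components

# Toy sizes: 5-dim feature, 8 hidden units, 3 components
# (e.g., head / torso / limb of a stick figure).
W1, b1 = rng.normal(size=(5, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

stroke = rng.normal(size=(20, 2))            # one sketched stroke
feat = frozen_encoder(stroke)
logits = mlp_classify(feat, W1, b1, W2, b2)
label = int(np.argmax(logits))               # predicted component index
print(label)
```

Because the encoder is frozen, only the small MLP head would need to be trained per category, which is what makes reusing a single encoder across symbol categories cheap.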
Multimodal Group Activity Dataset for Classroom Engagement Level Prediction
We collected a new dataset that includes approximately eight hours of
audiovisual recordings of a group of students and their self-evaluation scores
for classroom engagement. The dataset and data analysis scripts are available
on our open-source repository. We developed baseline face-based and
group-activity-based image and video recognition models. Our image models yield
45-85% test accuracy with face-area inputs on the person-based classification
task. Our video models achieve up to 71% test accuracy on group-level
prediction using group-activity video inputs. In this technical report, we
share the details of our end-to-end, human-centered engagement analysis
pipeline, from data collection to model development.
Feature Point Detection and Curve Approximation for Early Processing of Freehand Sketches
Freehand sketching is a natural and crucial part of design, yet it is unsupported by current design automation software. We are working to combine the flexibility and ease of use of paper and pencil with the processing power of a computer to produce a design environment that feels as natural as paper, yet is considerably smarter. One of the most basic steps in accomplishing this is converting the original digitized pen strokes in the sketch into the intended geometric objects using feature point detection and approximation. We demonstrate how multiple sources of information can be combined for feature detection in strokes, and we apply this technique using two approaches to signal processing: one using simple average-based thresholding and a second using scale space.
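The average-based thresholding idea can be illustrated with a minimal sketch (an assumption-laden toy, not the paper's algorithm): compute the turning angle at each interior point of a stroke and flag points whose turning exceeds the stroke's average as candidate feature (corner) points.

```python
import numpy as np

def corner_candidates(points):
    """Flag candidate feature points where the turning angle exceeds the
    stroke's average turning angle: a minimal version of simple
    average-based thresholding."""
    p = np.asarray(points, dtype=float)
    d = np.diff(p, axis=0)                   # segment vectors
    ang = np.arctan2(d[:, 1], d[:, 0])       # segment directions
    turn = np.abs(np.diff(np.unwrap(ang)))   # turning angle at interior points
    thresh = turn.mean()                     # average-based threshold
    # return indices into the original point list (interior points only)
    return [i + 1 for i, t in enumerate(turn) if t > thresh]

# An L-shaped polyline: the corner sits at index 2.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(corner_candidates(stroke))  # → [2]
```

A scale-space variant would instead smooth the stroke at multiple scales and keep feature points that persist across scales; combining both sources of evidence is the point of the approach described above.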
Towards Building Child-Centered Machine Learning Pipelines: Use Cases from K-12 and Higher-Education
Researchers and policy-makers have started creating frameworks and guidelines
for building machine-learning (ML) pipelines with a human-centered lens.
Machine-learning pipelines comprise all the steps needed to develop an ML
system (e.g., a predictive keyboard). Meanwhile, a child-centered focus in
developing ML systems has recently been gaining interest as children become
users of these products. These efforts predominantly focus on children's
interaction with ML-based systems. However, in our experience, ML pipelines
have yet to be adapted through a child-centered lens. In this paper, we list
the questions we ask ourselves when adapting human-centered ML pipelines to
child-centered ones. We also summarize two case studies of building
end-to-end ML pipelines for children's products.
Identity-Aware Semi-Supervised Learning for Comic Character Re-Identification
Character re-identification, recognizing characters consistently across
different panels in comics, presents significant challenges due to limited
annotated data and complex variations in character appearances. To tackle this
issue, we introduce a robust semi-supervised framework that combines metric
learning with a novel 'Identity-Aware' self-supervision method based on
contrastive learning of face and body pairs of characters. Our approach
involves processing
both facial and bodily features within a unified network architecture,
facilitating the extraction of identity-aligned character embeddings that
capture individual identities while preserving the effectiveness of face and
body features. This integrated character representation enhances feature
extraction and improves character re-identification compared to
re-identification by face or body independently, offering a parameter-efficient
solution. By extensively validating our method using in-series and inter-series
evaluation metrics, we demonstrate its effectiveness in consistently
re-identifying comic characters. Compared to existing methods, our approach not
only addresses the challenge of character re-identification but also serves as
a foundation for downstream tasks since it can produce character embeddings
without restrictions of face and body availability, enriching the comprehension
of comic books. In our experiments, we leverage two newly curated datasets:
the 'Comic Character Instances Dataset', comprising over a million character
instances, and the 'Comic Sequence Identity Dataset', containing annotations
of identities within more than 3,000 sets of four consecutive comic panels
that we collected.
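The contrastive face-body objective described above can be sketched as follows. This is a generic InfoNCE-style loss, not the paper's exact formulation; the batch size, embedding dimension, and temperature are illustrative assumptions.

```python
import numpy as np

def contrastive_loss(face_emb, body_emb, temperature=0.5):
    """Minimal contrastive loss over face/body pairs: each face embedding
    should be most similar to the body embedding of the same character
    (the positive, on the diagonal) and dissimilar to all other bodies
    (the negatives)."""
    f = face_emb / np.linalg.norm(face_emb, axis=1, keepdims=True)
    b = body_emb / np.linalg.norm(body_emb, axis=1, keepdims=True)
    sim = f @ b.T / temperature                  # cosine similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives on the diagonal

rng = np.random.default_rng(1)
faces = rng.normal(size=(4, 16))                  # 4 characters, 16-dim
bodies = faces + 0.01 * rng.normal(size=(4, 16))  # near-identical positives
loss_matched = contrastive_loss(faces, bodies)
loss_random = contrastive_loss(faces, rng.normal(size=(4, 16)))
print(loss_matched < loss_random)  # aligned pairs yield the lower loss
```

Training the same network to embed both faces and bodies into one space is what lets the resulting character embeddings work even when only a face or only a body is visible in a panel.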
Approximate solutions for nonlinear oscillation of a mass attached to a stretched elastic wire
In this paper, approximate solutions of the mathematical model of a mass attached to a stretched elastic wire are presented. At the beginning of the study, the equation of motion is derived in detail. He's max–min approach, He's frequency–amplitude method, and the parameter-expansion method are applied to solve the established model. The numerical results are compared with the approximate analytical solutions for both small and large amplitudes of oscillation, and very good agreement is observed. The relative errors are computed to illustrate the strength of agreement between the numerical and approximate analytical results.
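The abstract does not reproduce the equations, so the following is a hedged sketch of the commonly studied dimensionless form of this oscillator (with geometric parameter λ, 0 < λ ≤ 1) and of the bounding idea behind the max–min approach:

```latex
% Dimensionless equation of motion commonly used for a mass attached to a
% stretched elastic wire (u = displacement, A = initial amplitude):
\ddot{u} + u - \frac{\lambda u}{\sqrt{1 + u^{2}}} = 0,
\qquad u(0) = A, \quad \dot{u}(0) = 0, \quad 0 < \lambda \le 1 .

% Writing the equation as \ddot{u} + \omega^{2}(u)\, u = 0 with
% \omega^{2}(u) = 1 - \lambda / \sqrt{1 + u^{2}}, and noting that
% 1 \le \sqrt{1 + u^{2}} \le \sqrt{1 + A^{2}} for |u| \le A,
% the squared frequency is bounded by the extremes of this coefficient:
1 - \lambda \;\le\; \omega^{2} \;\le\; 1 - \frac{\lambda}{\sqrt{1 + A^{2}}} .
```

He's max–min approach then selects an approximate frequency between these two bounds, which is why its accuracy holds for both small and large amplitudes.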
Domain-Adaptive Self-Supervised Pre-Training for Face & Body Detection in Drawings
Drawings are powerful means of pictorial abstraction and communication.
Understanding diverse forms of drawings, including digital arts, cartoons, and
comics, has been a major problem of interest for the computer vision and
computer graphics communities. Although there are large amounts of digitized
drawings from comic books and cartoons, they contain vast stylistic variations,
which necessitate expensive manual labeling for training domain-specific
recognizers. In this work, we show how self-supervised learning, based on a
teacher-student network with a modified student network update design, can be
used to build face and body detectors. Our setup allows exploiting large
amounts of unlabeled data from the target domain when labels are provided for
only a small subset of it. We further demonstrate that style transfer can be
incorporated into our learning pipeline to bootstrap detectors using a vast
amount of out-of-domain labeled images from natural images (i.e., images from
the real world). Our combined architecture yields detectors with
state-of-the-art (SOTA) and near-SOTA performance using minimal annotation
effort.
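The abstract mentions a teacher-student network with a modified student update but does not give the modification's details, so the following shows only the generic exponential-moving-average (EMA) teacher update common in such self-supervised setups; all names and values are illustrative.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """EMA teacher update: the teacher's parameters slowly track the
    student's, providing stable pseudo-targets on unlabeled data."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

# Toy parameters: the teacher starts at zero, the student stays at one.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
for _ in range(100):
    teacher = ema_update(teacher, student)
print(np.round(teacher["w"], 3))  # drifts toward the student's weights
```

In a full pipeline of this kind the student is trained by gradient descent on labeled and pseudo-labeled examples while the teacher, updated only by this averaging rule, generates the pseudo-labels.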
Sketch interpretation using multiscale stochastic models of temporal patterns
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 102-114). Sketching is a natural mode of interaction used in a variety of settings. For example, people sketch during early design and brainstorming sessions to guide the thought process; when we communicate certain ideas, we use sketching as an additional modality to convey ideas that cannot be put into words. The emergence of hardware such as PDAs and Tablet PCs has made it possible to capture freehand sketches, enabling the routine use of sketching as an additional human-computer interaction modality. Despite the availability of pen-based information capture hardware, however, relatively little effort has been put into developing software capable of understanding and reasoning about sketches. To date, most approaches to sketch recognition have treated sketches as images (i.e., static finished products) and have applied vision algorithms for recognition. However, unlike images, sketches are produced incrementally and interactively, one stroke at a time, and their processing should take advantage of this. This thesis explores ways of doing sketch recognition by extracting as much information as possible from the temporal patterns that appear during sketching.
We present a sketch recognition framework based on hierarchical statistical models of temporal patterns. We show that in certain domains, the stroke orderings used in the course of drawing individual objects contain temporal patterns that can aid recognition. We build on this work to show how sketch recognition systems can use knowledge of both common stroke orderings and common object orderings. We describe a statistical framework based on Dynamic Bayesian Networks that can learn temporal models of object-level and stroke-level patterns for recognition. Our framework supports multi-object strokes and multi-stroke objects, and allows interspersed drawing of objects, relaxing the assumption that objects are drawn one at a time. Our system also supports real-valued feature representations using a numerically stable recognition algorithm. We present recognition results for hand-drawn electronic circuit diagrams. The results show that modeling temporal patterns at multiple scales provides a significant increase in correct recognition rates, with no added computational penalties. By Tevfik Metin Sezgin, Ph.D.
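The hierarchical DBN models themselves are beyond an abstract-level sketch, but the core idea of scoring a stroke sequence under a learned temporal model can be illustrated with the simplest such model: an HMM evaluated by the forward algorithm (DBNs generalize HMMs). All probabilities below are made-up toy values.

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Forward algorithm for an HMM: returns the log-likelihood of an
    observed stroke-label sequence under the temporal model.
    pi: initial state probabilities; A: state transition matrix;
    B: per-state observation probabilities; obs: observation indices."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission
    return float(np.log(alpha.sum()))

# Toy model: 2 hidden "object" states, 3 observable stroke types.
pi = np.array([0.8, 0.2])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1],    # state 0 favors stroke type 0
              [0.1, 0.3, 0.6]])   # state 1 favors stroke type 2
typical = forward_loglik(pi, A, B, [0, 0, 1])
atypical = forward_loglik(pi, A, B, [2, 2, 2])
print(typical > atypical)  # the common stroke ordering scores higher
```

Recognition then amounts to evaluating a candidate stroke sequence under the temporal model learned for each object class and picking the best-scoring class, which is the role the DBNs play at both the stroke and object levels above.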